How to normalize metatranscriptomic count data for differential expression analysis

Authors

  • Heiner Klingenberg
  • Peter Meinicke
Abstract

BACKGROUND Differential expression analysis on the basis of RNA-Seq count data has become a standard tool in transcriptomics. Several studies have shown that prior normalization of the data is crucial for reliable detection of transcriptional differences. Until now it has not been clear whether and how the transcriptomic approach can be used for differential expression analysis in metatranscriptomics.

METHODS We propose a model for differential expression in metatranscriptomics that explicitly accounts for variations in the taxonomic composition of transcripts across different samples. As a main consequence, the correct normalization of metatranscriptomic count data under this model requires the taxonomic separation of the data into organism-specific bins. The taxon-specific scaling of organism profiles then yields a valid normalization and allows us to recombine the scaled profiles into a metatranscriptomic count matrix. This matrix can then be analyzed with statistical tools for transcriptomic count data. For taxon-specific scaling and recombination of scaled counts we provide a simple R script.

RESULTS When transcriptomic tools for differential expression analysis are applied directly to metatranscriptomic data with an organism-independent (global) scaling of counts, the resulting differences may be difficult to interpret. The differences may correspond to changing functional profiles of the contributing organisms but may also result from a variation of taxonomic abundances. Taxon-specific scaling eliminates this variation, and therefore the resulting differences actually reflect a different behavior of organisms under changing conditions. In simulation studies we show that the divergence between results from global and taxon-specific scaling can be drastic. In particular, the variation of organism abundances can imply a considerable increase of significant differences with global scaling. On real metatranscriptomic data, too, the predictions from taxon-specific and global scaling can differ widely. Our studies indicate that in real data applications performed with global scaling it might be impossible to distinguish between differential expression in terms of transcriptomic changes and differential composition in terms of changing taxonomic proportions.

CONCLUSIONS As in transcriptomics, a proper normalization of count data is also essential for differential expression analysis in metatranscriptomics. Our model implies a taxon-specific scaling of counts for normalization of the data. The application of taxon-specific scaling consequently removes taxonomic composition variations from functional profiles and therefore provides a clear interpretation of the observed functional differences.


Similar resources

Normalization of metatranscriptomic and metaproteomic data for differential gene expression analyses: The importance of accounting for organism abundance

Normalization of metatranscriptomic and metaproteomic data for differential gene expression analyses: The importance of accounting for organism abundance. Author: Manuel Kleiner. Affiliations: Energy Bioengineering and Geomicrobiology Group, Department of Geoscience, University of Calgary, Calgary, Canada; Department of Plant and Microbial Biology, North Carolina State Univers...


Package 'tcc' Title Tcc: Differential Expression Analysis for Tag Count Data with Robust Normalization Strategies

April 26, 2017. Type: Package. Title: TCC: Differential expression analysis for tag count data with robust normalization strategies. Version: 1.16.0. Author: Jianqiang Sun, Tomoaki Nishiyama, Kentaro Shimizu, and Koji Kadota. Maintainer: Jianqiang Sun, Tomoaki Nishiyama. Description: This package provides a series of functions for performing dif...


Title Tcc: Differential Expression Analysis for Tag Count Data with Robust Normalization Strategies

December 22, 2016. Type: Package. Title: TCC: Differential expression analysis for tag count data with robust normalization strategies. Version: 1.14.0. Author: Jianqiang Sun, Tomoaki Nishiyama, Kentaro Shimizu, and Koji Kadota. Maintainer: Jianqiang Sun, Tomoaki Nishiyama. Description: This package provides a series of functions for performing ...


TCC: Differential expression analysis for tag count data with robust normalization strategies

The R/Bioconductor package, TCC, provides users with a robust and accurate framework to perform differential expression (DE) analysis of tag count data. We recently developed a multi-step normalization method (TbT; Kadota et al., 2012 [3]) for two-group RNA-seq data. The strategy (called DEGES) is to remove data that are potential differentially expressed genes (DEGs) before performing the data...


Identifying stably expressed genes from multiple RNA-Seq data sets

We examined RNA-Seq data on 211 biological samples from 24 different Arabidopsis experiments carried out by different labs. We grouped the samples according to tissue types, and in each of the groups, we identified genes that are stably expressed across biological samples, treatment conditions, and experiments. We fit a Poisson log-linear mixed-effect model to the read counts for each gene and ...



Journal title:

Volume 5, Issue

Pages -

Publication date: 2017